Automatic Identification Of Non-Compositional Multi-Word Expressions Using Latent Semantic Analysis

نویسندگان

  • Graham Katz
  • Eugenie Giesbrecht
چکیده

Making use of latent semantic analysis, we explore the hypothesis that local linguistic context can serve to identify multi-word expressions that have noncompositional meanings. We propose that vector-similarity between distribution vectors associated with an MWE as a whole and those associated with its constitutent parts can serve as a good measure of the degree to which the MWE is compositional. We present experiments that show that low (cosine) similarity does, in fact, correlate with non-compositionality.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Automatic Identification of Non-compositional Phrases

Non-compositional expressions present a special challenge to NLP applications. We present a method for automatic identification of non-compositional expressions using their statistical properties in a text corpus. Our method is based on the hypothesis that when a phrase is non-composition, its mutual information differs significantly from the mutual informations of phrases obtained by substitut...

متن کامل

Multi-Word Expression Identification Using Sentence Surface Features

Much NLP research on Multi-Word Expressions (MWEs) focuses on the discovery of new expressions, as opposed to the identification in texts of known expressions. However, MWE identification is not trivial because many expressions allow variation in form and differ in the range of variations they allow. We show that simple rule-based baselines do not perform identification satisfactorily, and pres...

متن کامل

Determining Compositionality of Word Expressions Using Word Space Models

This research focuses on determining semantic compositionality of word expressions using word space models (WSMs). We discuss previous works employing WSMs and present differences in the proposed approaches which include types of WSMs, corpora, preprocessing techniques, methods for determining compositionality, and evaluation testbeds. We also present results of our own approach for determining...

متن کامل

Multiword expressions: hard going or plain sailing?

Over the past two decades or so, Multi-Word Expressions (MWEs; also called Multi-word Units) have been an increasingly important concern for Computational Linguistics and Natural Language Processing (NLP). The term MWE has been used to refer to various types of linguistic units and expressions, including idioms, noun compounds, phrasal verbs, light verbs and other habitual collocations. However...

متن کامل

Presentation of an efficient automatic short answer grading model based on combination of pseudo relevance feedback and semantic relatedness measures

Automatic short answer grading (ASAG) is the automated process of assessing answers based on natural language using computation methods and machine learning algorithms. Development of large-scale smart education systems on one hand and the importance of assessment as a key factor in the learning process and its confronted challenges, on the other hand, have significantly increased the need for ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2006